
Least squares estimators of regression coefficients can be
overly sensitive to violations of certain error assumptions; e.g.,
outliers in the response variable. One solution to the presence
of outliers in a data base is to apply univariate robust estimation
procedures to the residuals of estimated models. Equally
problematic as outliers among the response variable are outliers
or aberrant values for the predictor variables. Extreme values
on individual predictor variables or an unusual combination of
predictor variable values for a few observational units can distort
least squares estimators even if the error assumptions are
valid. This article discusses robust regression procedures, with
special emphasis on techniques which are resistant to extreme
predictor variable values.

1. INTRODUCTION

The adequacy of least squares estimators of regression
coefficients is critically dependent on model specification and
model assumptions. Although least squares estimators possess
powerful theoretical properties (e.g., Seber 1977, Chapter 3) and
maintain relative insensitivity to some violations of model
assumptions (e.g., Box and Watson 1962), certain model anomalies
such as outliers can severely distort least squares estimates
(e.g., Gunst and Mason 1980, Section 2.1.3). Robust regression
procedures are potentially useful for both detecting and
effectively adjusting for outliers.

Outliers among the response or predictor variables can occur
for a variety of reasons including transcribing or coding mistakes,
unusual experimental conditions, or truly aberrant data values.
With large data sets it is often difficult to detect one or a
few outliers, particularly if they cluster in the same region
of the (p+1)-dimensional space of response and predictor variables.
Yet their impact on coefficient estimates can be catastrophic if
the outliers lie in strategic corners of the space of response
and predictor variables. For these reasons, adaptations of
traditional (e.g., maximum likelihood) estimation procedures which
could provide protection against outliers are a current focus
of research activity.


In this article only Huber's version of M-estimation will be
investigated. Other variants of robust regression procedures
have been proposed. For example, Andrews (1974) explores
M-estimation utilizing a trigonometric weighting function on the
residuals. Ruppert and Carroll (1980) and Koenker and Bassett
(1978) use regression quantiles and trimmed residuals to obtain
robust regression estimators. Iman and Conover (1979) adopt
rank transforms on the response and the predictor variables in
order to reduce the impact of outliers on the prediction of the
response variable. Finally, Askin and Montgomery (1980) discuss
the combination of robust and biased regression estimators to
simultaneously combat the ill effects of outliers and of
multicollinearities among the predictor variables.

The sections which follow develop the need for robust
regression procedures and suggest methods which can compensate for
outliers in the response or the predictor variables. Section 2
of this article outlines robust M-estimation for regression models.
Section 3 discusses influence functions and their role in the
assessment of robustness properties of estimators. In this section
both least squares and M-estimators are shown to be affected by
predictor variable outliers. Several proposals for detecting
outliers among the predictor variable values and for adjusting
regression estimators in order to compensate for these outliers
are described in Section 4. Section 5 briefly discusses
outlier-induced multicollinearities. A detailed example is given in
Section 6 and concluding remarks are made in Section 7.

2.  PRELIMINARIES 

Write a multiple linear regression model as

$$ Y = \beta_0 \mathbf{1} + X\beta + \varepsilon , \tag{2.1} $$

where $Y$ is an n-dimensional vector of observable variables, $\mathbf{1}$
is a vector of ones, $X$ is a centered ($X'\mathbf{1} = 0$) full-column-rank
matrix of observations on p nonstochastic predictor variables,
$\beta_0$ and $\beta$ are the unknown constant and p-dimensional vector of
regression coefficients, respectively, and $\varepsilon$ is an unobservable
random error vector. Least squares estimators of the parameters
in model (2.1) are obtained by minimizing

$$ \sum_{i=1}^{n} \rho(r_i) , \tag{2.2} $$

where $\rho(r_i) = r_i^2$ and $r_i = Y_i - \hat\beta_0 - u_i'\hat\beta$ is the ith fitted
residual based on the estimators $\hat\beta_0$ and $\hat\beta$ ($u_i'$ is the ith row of $X$).
Since $\rho(\cdot)$ is differentiable one can easily show that minimization
of (2.2) is equivalent to solving the following system of
(p+1) homogeneous equations (the "normal equations"):

$$ \sum_{i=1}^{n} \psi(r_i) = 0 , \qquad \sum_{i=1}^{n} X_{ij}\,\psi(r_i) = 0 , \quad j = 1, 2, \ldots, p , \tag{2.3} $$

where $\psi(t) = d\rho(t)/dt \propto t$. The resulting least squares estimators
are

$$ \hat\beta_0 = \bar{Y} \quad \text{and} \quad \hat\beta = (X'X)^{-1}X'Y . \tag{2.4} $$

If $\varepsilon_i \sim NID(0, \sigma^2)$, the least squares estimators are maximum
likelihood estimators since $\rho(e) = -2\sigma^2 \ln[f_\sigma(e)] + c$, where $f_\sigma(e)$ is
the density function for a $N(0, \sigma^2)$ variate and $c$ is a constant
which does not depend on $\beta_0$ and $\beta$.
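
The equivalence can be checked in a few lines. The derivation below is an added worked step, written under the assumption that $\rho(e) = e^2$ as in eqn. (2.2):

```latex
\begin{align*}
f_\sigma(e) &= (2\pi\sigma^2)^{-1/2}\exp\{-e^2/(2\sigma^2)\}, \\
-2\sigma^2 \ln f_\sigma(e) &= e^2 + \sigma^2 \ln(2\pi\sigma^2), \\
\rho(e) = e^2 &= -2\sigma^2 \ln f_\sigma(e) + c,
  \qquad c = -\sigma^2 \ln(2\pi\sigma^2).
\end{align*}
```

For any fixed $\sigma^2$, minimizing the sum of the $\rho(r_i)$ is therefore the same as maximizing the normal log-likelihood over the regression coefficients.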

Robust M-estimators seek to reduce the influence of aberrant
response values while retaining an equivalence with maximum
likelihood estimators when no such wild response values occur. This
is accomplished by selecting a function $\rho(\cdot)$ which will leave
"typical" residuals unchanged but will lessen the influence
of large residuals on the solution of eqns. (2.3). Most
M-estimation procedures require that $\rho(\cdot)$ be convex, nonmonotone,
and that it possess a bounded, continuous derivative $\psi(\cdot)$. The
convexity and monotonicity properties are imposed to insure
unique solutions while the boundedness and continuity of $\psi(\cdot)$
insure that the estimator cannot be dominated by an extremely
large residual (boundedness) and that small changes in residuals
cannot produce large changes in the resulting estimates
(continuity). Existence of higher-order derivatives of $\rho(\cdot)$ is
desirable for theoretical derivations of asymptotic properties
of M-estimators.

Huber (1964) popularized the use of a robust M-estimator
which can be defined in terms of the following function $\rho(\cdot)$:

Dutter (1975, 1977) and Huber (1981) elect not to minimize eqn.
(2.2) but instead choose to minimize

$$ n^{-1} \sum_{i=1}^{n} [\rho(r_i/\sigma) + a]\,\sigma , \tag{2.7} $$

or, equivalently, to solve the following system of (p+2) equations

$$ \sum_{i=1}^{n} \psi(r_i/\sigma) = 0 , \qquad \sum_{i=1}^{n} X_{ij}\,\psi(r_i/\sigma) = 0 , \quad j = 1, 2, \ldots, p , \tag{2.8} $$

and

$$ n^{-1} \sum_{i=1}^{n} \chi(r_i/\sigma) = a , \tag{2.9} $$

where $\chi(t) = t\psi(t) - \rho(t)$. Equations (2.3) and (2.8) are
identical if $r_i$ is replaced in the former set by $r_i/\sigma$. Since the
residuals are standardized in eqns. (2.8) by an estimate of
scale, the value of $c$ in eqn. (2.6) need not depend on $\sigma$ and is
often chosen to be 1.5. The value of $a$ is selected so eqn. (2.9)
will yield a consistent estimator of $\sigma$ when $\varepsilon \sim N(0, \sigma^2)$; viz.,
$a = (n-p-1)\,E[\chi(\varepsilon/\sigma)]/n$.

Iterating with eqns. (2.8) and (2.9) is relatively
straightforward. Let $\hat\theta_{(k)}$ denote the estimates of $\theta' = (\beta_0, \beta')$ obtained
on the kth iterate and let $\hat\sigma^2_{(k)}$ denote the corresponding estimate
of $\sigma^2$. From eqn. (2.9), a new estimate of $\sigma^2$ is

$$ \hat\sigma^2_{(k+1)} = (na)^{-1} \sum_{i=1}^{n} \chi(r_i/\hat\sigma_{(k)})\,\hat\sigma^2_{(k)} , \tag{2.10} $$

where for ease of notation we let $r_i$ denote the ith residual
obtained from the kth iteration. By letting $\phi(t) = \psi(t)/t$, eqns.
(2.8) can be rewritten as

$$ \sum_{i=1}^{n} \phi(r_i/\sigma)\,(r_i/\sigma) = 0 , \qquad \sum_{i=1}^{n} X_{ij}\,\phi(r_i/\sigma)\,(r_i/\sigma) = 0 \tag{2.11} $$

or as

$$ \hat\theta = (Z'\Phi Z)^{-1} Z'\Phi Y , \tag{2.12} $$

where $Z = [\mathbf{1}, X]$ and $\Phi = \mathrm{diag}(\phi(r_1/\sigma), \ldots, \phi(r_n/\sigma))$. Equation
(2.12) is simply a weighted least squares estimator of $\theta$ in which
the stochastic weights are $\phi(r_i/\sigma)$. Using the residuals from
the kth iteration and $\hat\sigma^2_{(k+1)}$ from eqn. (2.10), $\hat\theta_{(k+1)}$ is found
from this weighted least squares estimator.

Based on the foregoing, iterative estimation of the
parameters of model (2.1) can be based on the following sequence of
steps:

1. Obtain initial estimates of $\beta_0$ and $\beta$ from eqns. (2.4)
   or from one of the estimators proposed in Section 4,

2. Use either the least squares estimate of $\sigma$ or some
   robust estimate of scale; e.g., $\hat\sigma = \{\mathrm{median}\,|r_i^*|\}/0.6745$,
   where $r_i^* = r_i - \mathrm{median}\{r_i\}$ (Andrews et al. 1972),

3. Calculate $\hat\sigma^2$ from eqn. (2.10) using $c = 1.5$,

4. Update the estimates of $\beta_0$ and $\beta$ with the weighted
   least squares estimator (2.12),

5. Repeat steps 3 and 4 until satisfactory convergence
   is reached.
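
The five steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' program: Huber's function is not reproduced in this text, so the sketch assumes the common form rho(t) = t^2/2 for |t| <= c and c|t| - c^2/2 otherwise (a constant rescaling of rho cancels in eqn. (2.10) because the constant a rescales with it). It also restricts attention to one predictor plus an intercept (p = 1), and the function names are hypothetical. There is no guard against a scale estimate of zero.

```python
import math
from statistics import median


def psi(t, c=1.5):
    """Assumed Huber psi-function: identity on [-c, c], clipped outside."""
    return max(-c, min(c, t))


def chi(t, c=1.5):
    """chi(t) = t*psi(t) - rho(t); equals psi(t)**2 / 2 for the assumed rho."""
    return 0.5 * psi(t, c) ** 2


def expected_chi(c=1.5):
    """E[chi(Z)] for Z ~ N(0, 1), needed for the consistency constant a."""
    cdf = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    return 0.5 * ((2.0 * cdf - 1.0) - 2.0 * c * pdf + 2.0 * c * c * (1.0 - cdf))


def weighted_ls(x, y, w):
    """Weighted least squares fit of y = b0 + b1*x (eqn. (2.12) with p = 1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def huber_regression(x, y, c=1.5, tol=1e-10, max_iter=500):
    n, p = len(x), 1
    b0, b1 = weighted_ls(x, y, [1.0] * n)                 # step 1: least squares start
    r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    med = median(r)
    sigma = median(abs(ri - med) for ri in r) / 0.6745    # step 2: robust scale
    a = (n - p - 1) * expected_chi(c) / n
    for _ in range(max_iter):
        # step 3: scale update, eqn. (2.10)
        sigma = math.sqrt(sum(chi(ri / sigma, c) for ri in r) * sigma ** 2 / (n * a))
        # step 4: weights phi(t) = psi(t)/t, then weighted least squares
        w = [1.0 if ri == 0.0 else psi(ri / sigma, c) / (ri / sigma) for ri in r]
        new_b0, new_b1 = weighted_ls(x, y, w)
        change = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        if change < tol:                                  # step 5: stop when stable
            break
    return b0, b1, sigma
```

Calling `huber_regression(x, y)` on data containing one wild response value should return a slope much closer to the bulk of the data than `weighted_ls(x, y, [1.0] * len(x))` does, since the wild residual receives a weight well below one.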

This algorithm for finding robust regression estimates
provides good protection against aberrant response (error) terms.
Reasons for this protection, apart from the informal discussions
given above, can be readily appreciated by examining the influence
functions corresponding to least squares and M-estimators. At
the same time, the lack of "resistance" of both of these
estimators to outliers in the predictor variables can be seen from
the influence functions. We now turn to this more formal
evaluation of the sensitivity of regression estimators to
violations of model assumptions.

3. INFLUENCE FUNCTIONS

Hampel (1968, 1974) introduced the use of influence
functions for studying robustness properties of estimators.
The local behavior of an estimator in a neighborhood of the
assumed underlying distribution is studied by first expressing
the estimator as a functional on a space of probability
distributions. Then the influence function of the estimator is
defined to be the derivative of the functional evaluated at the
assumed distribution. Not only can idealized or "parametric"
influence functions be defined in this manner but empirical
influence functions can also be defined in terms of empirical
distribution functions. Before turning to regression models,
these concepts will be illustrated on a simple location model.


Let $T(F)$ denote a real-valued functional defined on a
subset of probability distributions, $F \in \mathcal{F}$. For example, the
mean functional can be defined as

$$ \int (x - T(F))\,dF(x) = 0 , \tag{3.1} $$

yielding $T(F) = \mu = \int x\,dF(x)$. If $F_n$ is an empirical c.d.f.
based on a random sample of size n from $F$, an estimator of $T(F)$
can be derived from eqn. (3.1) as

$$ \int (x - T(F_n))\,dF_n(x) = 0 \tag{3.2} $$

or $T(F_n) = \hat\mu = n^{-1} \sum_{i=1}^{n} x_i$. The functional $T(F)$ can be viewed
either as a parametric analogue to the finite-sample estimator
(3.2) or as a limiting estimator for very large sample sizes.

Consider next the effect of an outlier, $x_0$, on $T(F)$ and
$T(F_n)$. In the space of probability distributions an outlier
can be modeled as a mixture distribution

$$ F^{\alpha}(x) = (1-\alpha)F(x) + \alpha H_0(x) , \quad 0 \le \alpha \le 1 , \tag{3.3} $$

where

$$ H_0(x) = \int_{-\infty}^{x} \delta_0(t)\,dt $$

and $\delta_0(t)$ is a probability density function for the contaminant.
For the remainder of this section we will assume that $\delta_0(t)$
assigns point mass to $x_0$. Using this contaminated (point
mass) distribution function the influence of $x_0$ on the estimator
can be assessed.

A measure of the impact of an outlier $x_0$ on the estimator
$T(F)$ is the "influence function" which is defined to be

$$ \dot{T}(F) = \lim_{\alpha \to 0^{+}} \frac{T(F^{\alpha}) - T(F)}{\alpha} , \tag{3.4} $$

where $\dot{T}(F)$ can be viewed as a directional derivative of $T(F)$ in
the direction of $x_0$ if the limit exists and is unique as the
limit is taken from positive and negative directions. An
empirical influence function can be defined in a similar fashion
simply by replacing $F$ with $F_n$ in eqn. (3.4).

Contamination of the assumed distribution by the outlier
$x_0$ distorts the estimator. For large samples $T(F^{\alpha}) = \mu + \alpha(x_0 - \mu)$
and $\dot{T}(F) = x_0 - \mu$. Thus the influence (distortion) of the
estimator is proportional to $x_0 - \mu$. For finite samples,
$T(F_n^{\alpha}) = \bar{x}_n + \alpha(x_0 - \bar{x}_n)$ and $\dot{T}(F_n) = x_0 - \bar{x}_n$, where $\bar{x}_n = n^{-1} \sum x_i$.
Note that in either case the influence functions are unbounded
functions of the contaminant $x_0$; consequently, a single gross
outlier can have a devastating effect on the estimator even if the
outlier occurs with relatively small likelihood ($\alpha$).

These results contrast with robust M-estimation in that
the latter estimators possess bounded influence functions and
thereby limit the distortion an outlier can cause. Robust
M-estimator functionals in the location model satisfy the equation

$$ \int \psi(x - T(F))\,dF(x) = 0 , \tag{3.5} $$

which reduces to eqn. (3.1) when $\psi(t) = t$. Replacing $F(x)$
by $F^{\alpha}(x)$, differentiating eqn. (3.5) implicitly, and evaluating
the derivative at $\alpha = 0$ yields

$$ \dot{T}(F) = \frac{\psi(x_0 - T(F))}{\int \dot\psi(x - T(F))\,dF(x)} , \tag{3.6} $$

where $\dot\psi(t) = d\psi(t)/dt$. The influence function (3.6) is
proportional to $\psi(\cdot)$ and is thereby a bounded function of $x_0$. Analogous
properties hold for the empirical influence function.
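
The contrast between the two empirical influence functions can be checked numerically. The sketch below is illustrative only: it assumes a clipped-identity (Huber-type) psi with c = 1.5, treats the scale as known and equal to one, and uses hypothetical function names. The M-estimate is found by bisection, which works because the sum of psi(x - T) is nonincreasing in T.

```python
def psi(t, c=1.5):
    """Assumed Huber-type psi: identity on [-c, c], clipped outside."""
    return max(-c, min(c, t))


def dpsi(t, c=1.5):
    """Derivative of psi (taken as 0 at the corners)."""
    return 1.0 if abs(t) < c else 0.0


def huber_location(xs, c=1.5, iters=200):
    """Solve sum(psi(x - T)) = 0 for T by bisection on [min(xs), max(xs)]."""
    lo, hi = min(xs), max(xs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(psi(x - mid, c) for x in xs) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def influence_mean(xs, x0):
    """Empirical influence of a contaminant x0 on the sample mean."""
    return x0 - sum(xs) / len(xs)


def influence_huber(xs, x0, c=1.5):
    """Empirical analogue of eqn. (3.6) for the location M-estimate."""
    t = huber_location(xs, c)
    denom = sum(dpsi(x - t, c) for x in xs) / len(xs)
    return psi(x0 - t, c) / denom


sample = [-1.2, -0.7, -0.3, 0.0, 0.1, 0.4, 0.8, 1.1]
for x0 in (2.0, 10.0, 1000.0):
    print(x0, influence_mean(sample, x0), influence_huber(sample, x0))
```

The influence on the mean grows without limit as `x0` moves away from the sample, while the influence on the M-estimate stops growing once `x0` is more than `c` units from the estimate.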

Turning now to the regression model (2.1), the regression
functional can be written as

$$ \int Z'(Y - Z\,T(F))\,dF(Y) = 0 , \tag{3.7} $$

where $Z = [\mathbf{1}, X]$ and $F(Y)$ represents the c.d.f. of a multivariate
normal density function, $Y \sim N(Z\theta, \sigma^2 I)$ with $\theta' = (\beta_0, \beta')$. This
functional can be rewritten as

$$ T(F) = \theta = (Z'Z)^{-1} Z' \int Y\,dF(Y) . \tag{3.8} $$

It is important to realize that the response vector $Y$ represents
a single observation from this multivariate normal distribution
and not n independent observations from a univariate distribution.
Thus an appropriate contaminated distribution for this functional
is

$$ F^{\alpha}(Y) = (1-\alpha)F(Y) + \alpha H_0(Y) , \tag{3.9} $$

where $H_0(Y)$ is a c.d.f. for the contaminated distribution of an
n-dimensional outlier $Y_0 = Z\theta + \varepsilon_0$. The error $\varepsilon_0$ does not follow
the assumed $N(0, \sigma^2 I)$ distribution and could be partially or
completely deterministic. Single response outliers can be modeled
by defining (n-1) of the elements of $\varepsilon_0$ to have the assumed
$NID(0, \sigma^2)$ error distribution and the remaining one to have a
different distribution (perhaps deterministic). The influence
function corresponding to eqns. (3.7) and (3.9) is

$$ \dot{T}(F) = (Z'Z)^{-1} Z'(Y_0 - Z\theta) = (Z'Z)^{-1} Z'\varepsilon_0 . \tag{3.10} $$

This influence function reveals that the distortion in the
functional (3.8) is proportional to the error vector, $\varepsilon_0 = Y_0 - Z\theta$,
and is an unbounded function of the elements of the contaminant
$Y_0$. Thus, as in the location model, regression estimators can be
severely distorted by gross outliers.

Corresponding to the functional (3.8), a regression
functional for a robust M-estimator is

$$ \int Z'\,\Psi(Y - Z\,T(F))\,dF(Y) = 0 , \tag{3.11} $$

where $\Psi(t) = (\psi(t_1), \ldots, \psi(t_n))'$ for some robust $\psi(\cdot)$-function.
Using eqn. (3.9) as the contaminated distribution produces the
following expression for the influence function $\dot{T}(F)$:

$$ Z' \int \dot\Psi(Y - Z\theta)\,dF(Y)\; Z\,\dot{T}(F) = Z'\,\Psi(Y_0 - Z\theta) , \tag{3.12} $$

where $\dot\Psi(t) = \mathrm{diag}(\dot\psi(t_1), \ldots, \dot\psi(t_n))$. As with eqn. (3.6) for the
location model, the influence function in eqn. (3.12) is
proportional to $\Psi(Y_0 - Z\theta)$ and is therefore a bounded function of the elements
of $Y_0$. A similar derivation for the empirical influence function $\dot{T}(F_1(Y))$
yields

$$ Z'\,\dot\Psi(Y - Z\hat\theta)\,Z\,\dot{T}(F_1(Y)) = Z'\,\Psi(Y_0 - Z\hat\theta) $$

or

$$ \dot{T}(F_1(Y)) = [Z'\,\dot\Psi(Y - Z\hat\theta)\,Z]^{-1} Z'\,\Psi(Y_0 - Z\hat\theta) , \tag{3.13} $$

where $\hat\theta$ is the robust M-estimator (2.11).

Hampel (1974) and Huber (1981) justify the need for bounded
influence  functions.  Intuitively,  the  above  derivations  show 
that  estimators  can  be  severely  distorted  by  gross  contaminants 
even  if  their  likelihood  of  occurrence  is  small.  Robust  M- 
estimators  bound  the  influence  functions  and  thereby  limit  the 
change  which  an  errant  point  can  produce  in  an  estimator.  Of 
special  importance  is  the  safeguard  that  robust  M-estimators 
provide  against  catastrophic  distortions  by  outliers  in  the 
response  variable  for  either  location  or  regression  models. 

Although robust M-estimators provide protection from
contaminated distributions for the response variable, it should
be apparent from eqns. (3.12) and (3.13) that no specific
protection is offered for aberrant predictor variable values. That
outliers in the predictor variables are as insidious a problem
as outliers in the response variable can be illustrated with a
simple example. Suppose eqn. (2.1) represents a single-variable,
no-intercept model: $Y_i = \beta X_i + \varepsilon_i$. Rewrite eqn. (2.3) as

$$ \psi(Y_1 - X_1\hat\beta) + X_1^{-1} \sum_{i=2}^{n} X_i\,\psi(Y_i - X_i\hat\beta) = 0 . $$

If the response variables are held fixed and $X_1 \to \infty$, the second
term of this equation is driven to zero. Consequently, eqn. (2.3)
reduces to

$$ \psi(Y_1 - X_1\hat\beta) = 0 $$

which has a solution $\hat\beta = X_1^{-1} Y_1 \to 0$. Thus regardless of the
value of $\beta$, the estimator of $\beta$ approaches 0 for both least
squares, $\psi(t) = t$, and for any robust M-estimator possessing
the properties described in Section 2; in particular, nonmonotonic
continuous $\psi(\cdot)$-functions for which $\psi(0) = 0$, including eqn. (2.6).

This example illustrates that a single errant predictor
variable value can have as catastrophic an effect on estimation
of parameters for regression models as can outliers in the
response variable. The next section examines several proposals
for dealing with aberrant predictor variable values.
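
The single-variable, no-intercept example is easy to reproduce. The following sketch is illustrative only: the data are made up, the scale is fixed at one, psi is assumed to be the clipped identity with c = 1.5, and the function names are hypothetical. The M-estimate solves the estimating equation by bisection, which is valid because the left-hand side is nonincreasing in the slope.

```python
def psi(t, c=1.5):
    """Assumed Huber-type psi: identity on [-c, c], clipped outside."""
    return max(-c, min(c, t))


def ls_slope(x, y):
    """Least squares slope for the no-intercept model y = b*x + e."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def huber_slope(x, y, c=1.5, iters=200):
    """Solve sum(x_i * psi(y_i - x_i*b)) = 0 for b by bisection.

    Assumes every x_i is nonzero, so the ratios y_i/x_i bracket the root.
    """
    ratios = [yi / xi for xi, yi in zip(x, y)]
    lo, hi = min(ratios), max(ratios)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = sum(xi * psi(yi - xi * mid, c) for xi, yi in zip(x, y))
        if g > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]        # roughly y = 2x
x_bad = [1.0e6] + x[1:]               # first predictor value made extreme

print(ls_slope(x, y), huber_slope(x, y))
print(ls_slope(x_bad, y), huber_slope(x_bad, y))
```

With the clean predictor values both estimates are close to 2. After the first predictor value is made extreme while the responses are held fixed, both estimates collapse toward zero, which is the behavior the example in the text describes.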

4.  PROPOSED  SOLUTIONS 


A natural solution to the problem of outliers in the
predictor variables is to weight each predictor variable in
a fashion similar to M-estimation on the response variable.
Accordingly, one could replace $X_{ij}$ by $\psi_r(X_{ij})$, where

$$ \psi_r(X_{ij}) = \begin{cases} X_{ij} , & |X_{ij}| \le c_j \\ c_j\,\mathrm{sign}(X_{ij}) , & |X_{ij}| > c_j , \end{cases} \tag{4.1} $$

$c_j = 1.5\,s_j$, and $s_j$ is a robust measure of scale for the n
observations on $X_j$. One could also center $X_j$ with a robust estimate of
location prior to forming the $\psi_r(\cdot)$.


Another  proposal  (Mallows  1973,  Denby  and  Larson  1977)  is 

Detection of extreme predictor variable values, or combinations
of predictor variables, is the first step in rectifying the
estimation problems they produce. Abnormally large or small values for
individual predictor variables are relatively easy to detect from
(perhaps robust) summary statistics. For example, some computer
programs automatically "flag" observations which are further than
two or three standard deviations from the mean. Examination of
the weights in eqn. (4.1) or (4.2) is also useful for detecting
outliers in one dimension. For outliers in two or more dimensions
other techniques are needed.

Hoaglin and Welsch (1978) popularized the use of a matrix
referred to as the "hat matrix" to detect outliers among the
predictor variables. The hat matrix is so named because it
transforms the response vector into the least squares prediction
vector $\hat{Y} = HY$ where

$$ H = Z(Z'Z)^{-1}Z' = n^{-1}\mathbf{1}\mathbf{1}' + X(X'X)^{-1}X' . \tag{4.3} $$

Diagonal elements of the hat matrix are

$$ h_{ii} = n^{-1} + u_i'(X'X)^{-1}u_i , \tag{4.4} $$

where the quadratic form in $u_i$ represents a (squared) Mahalanobis
distance of the ith row of $X$ from the centroid of the predictor
variable space. Large values of $h_{ii}$ indicate rows of $X$ which lie
in extreme regions of the observed predictor variable space.
Anomalous values on one or more predictor variables can be detected
by the $h_{ii}$. Since the predictor equation for the ith response
variable can be written as

$$ \hat{Y}_i = h_{ii}Y_i + \sum_{j \ne i} h_{ij}Y_j , \tag{4.5} $$

the $h_{ii}$ are a direct measure of the relative importance of $Y_i$
in predicting its own value. Due to the importance of the diagonal
elements of $H$ in detecting multidimensional outliers and assessing
the influence of $Y_i$ on $\hat{Y}_i$, they have been termed "leverage values."

The hat matrix $H$ is idempotent; consequently, the leverage
values are constrained to the interval [0,1]. The more extreme a
row of $X$ is relative to the other rows of $X$, the closer the
corresponding leverage value is to 1. For example, if model (2.1)
contains a single predictor variable

$$ h_{ii} = n^{-1} + (X_i - \bar{X})^2 \Big/ \sum_{k=1}^{n} (X_k - \bar{X})^2 . $$

Observe that if $X_i = \bar{X}$, $h_{ii} = n^{-1}$ but if $X_i$ is very large in
magnitude

$$ h_{ii} = n^{-1} + \frac{(1 - X_i^{-1}\bar{X})^2}{(1 - X_i^{-1}\bar{X})^2 + \sum_{j \ne i} (X_i^{-1}X_j - X_i^{-1}\bar{X})^2}
   \approx n^{-1} + \frac{(1 - n^{-1})^2}{(1 - n^{-1})^2 + (n-1)(-n^{-1})^2} = 1 . $$

In the previous section it was shown that as $X_i \to \infty$ the least
squares estimator $\hat\beta$ approaches zero. From eqn. (4.5) or by
directly evaluating $\hat{Y}_i = X_i\hat\beta$ one can show that $\hat{Y}_i \to Y_i$ as $X_i \to \infty$.
In general, if $u_i$ is a single outlier in $X$ the corresponding
predicted response will be almost uniquely determined by, and
equal to, its observed response. Concomitant with near perfect
prediction of $Y_i$ when $h_{ii} \approx 1$ will often occur severe distortion
of one or more of the coefficient estimates.
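
The single-predictor leverage formula can be explored with a few lines of Python. This is a minimal sketch with made-up data and a hypothetical function name; it only evaluates the closed form for one predictor plus an intercept.

```python
def leverages(x):
    """h_ii = 1/n + (x_i - xbar)^2 / sum_k (x_k - xbar)^2 for one predictor."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]


base = [float(i) for i in range(1, 10)]       # nine ordinary predictor values
for extreme in (10.0, 100.0, 1.0e4, 1.0e6):
    h = leverages(base + [extreme])
    # leverage of the extreme row, and the total (p + 1 = 2 for this model)
    print(extreme, h[-1], sum(h))
```

As the tenth predictor value is moved further out, its leverage climbs toward 1 while the leverages always sum to 2, so the remaining nine rows share what is left.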

Multivariate outliers are detectable not only by large
leverage values but also in the normalized principal components
of $X$. Let $X_s$ denote the standardized ($X_s'X_s$ is in correlation form)
matrix of predictor variables. Further, let $l_1 \le l_2 \le \ldots \le l_p$
denote the latent roots of $X_s'X_s$ and $V_1, V_2, \ldots, V_p$ the corresponding
latent vectors. The jth normalized principal component of $X_s$ is
$m_j = l_j^{-1/2} X_s V_j$. An extreme row of $X$ causes an elongation of one
of the component axes and a large element in the corresponding
normalized principal component. Since the component vectors are
mutually orthonormal, univariate weights such as eqn. (4.1) or (4.2)
could prove effective in obtaining estimators of the principal
component coefficients $\gamma_j = l_j^{1/2} V_j'\beta$ which are resistant to
outliers in the predictor variables. Inverse transformations could
then produce resistant estimators of the $\beta_j$.

Another alternative to the above proposals is a direct
weighting of the rows of $X$. Consider weighting the kth row of $X$
by a factor $\omega_k$, where $0 \le \omega_k \le 1$. Then model (2.1) is replaced
by

As a function of $\omega_k$ and the original leverage values, the kth
leverage value of the weighted predictor matrix is

$$ h_{kk}(\omega_k) = n^{-1} + (h_{kk} - n^{-1})\,t_2 \big/ [1 - t_1(h_{kk} - n^{-1})] , \tag{4.8} $$

where $t_1 = (1 - \omega_k^2) + n^{-1}(1 - \omega_k)^2$ and $t_2 = [n^{-1}(1 + (n-1)\omega_k)]^2$.
For large sample sizes $t_1 \approx 1 - \omega_k^2$ and $t_2 \approx \omega_k^2$, yielding an
approximation to eqn. (4.8):

$$ h_{kk}(\omega_k) \approx \frac{\omega_k^2\,h_{kk}}{1 - (1 - \omega_k^2)\,h_{kk}} . \tag{4.9} $$

Note that for large sample sizes and $\omega_k < 1$, $h_{kk}(\omega_k) < h_{kk}$;
moreover, algebraic manipulation of eqn. (4.8) allows one to
verify that the same property holds for all sample sizes.

Equation (4.9) provides a rationale for selecting a value of
$\omega_k$. Suppose one wishes to fix the leverage value of the kth row
of the weighted matrix to be a suitably small or moderate value $\eta$, $\eta \ge n^{-1}$.
By setting $h_{kk}(\omega_k) = \eta$ in eqn. (4.9) one can solve for a value

$$ \omega_k^2 = \frac{\eta\,(1 - h_{kk})}{h_{kk}\,(1 - \eta)} . \tag{4.10} $$

Note that setting $\eta = n^{-1}$ yields $\omega_k \approx 0$ for moderate to large
sample sizes; i.e., setting $\eta = n^{-1}$ in eqn. (4.10) results in
replacement of $u_k'$ by $0'$ (approximately). Similarly, setting
$\eta = h_{kk}$ yields $\omega_k = 1$; i.e., $u_k'$ is left unchanged in $X$.
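
The choice of weight can be checked numerically for a single predictor. The sketch below rests on two assumptions, because the display equations in this part of the text are damaged: weighting is taken to mean multiplying the kth centered predictor value by the weight and then recentering, and the weight is taken to enter eqns. (4.8) through (4.10) as its square. The data and function names are hypothetical.

```python
import math


def leverage(x, k):
    """Leverage of observation k for one predictor plus an intercept."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return 1.0 / n + (x[k] - xbar) ** 2 / sxx


def weighted_leverage_exact(x, k, w):
    """Center x, multiply the k-th centered value by w, then recompute."""
    xbar = sum(x) / len(x)
    xc = [xi - xbar for xi in x]
    xc[k] *= w
    return leverage(xc, k)            # leverage() recenters internally


def weighted_leverage_formula(h, n, w):
    """Closed form assumed for eqn. (4.8)."""
    t1 = (1.0 - w * w) + (1.0 - w) ** 2 / n
    t2 = ((1.0 + (n - 1) * w) / n) ** 2
    q = h - 1.0 / n
    return 1.0 / n + q * t2 / (1.0 - t1 * q)


def weight_for_target(h, eta):
    """Large-sample weight assumed for eqn. (4.10), returned unsquared."""
    return math.sqrt(eta * (1.0 - h) / (h * (1.0 - eta)))


x = [float(i) for i in range(39)] + [200.0]   # last row is a predictor outlier
k = len(x) - 1
h = leverage(x, k)
w = weight_for_target(h, eta=0.2)
print(h, w, weighted_leverage_exact(x, k, w), weighted_leverage_formula(h, len(x), w))
```

Under these assumptions the closed form agrees with the direct recomputation, and the leverage actually achieved is near, though not equal to, the target, since the weight comes from the large-sample approximation.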

To illustrate the effect of this type of weighting scheme,
let us return to the single-variable, no-intercept model and


above.  The  theoretical  results  are  far  more  complex  and  iterative 
schemes  need  to  be  developed  in  order  to  solve  for  weights  which 
will  enable  two  or  more  leverage  values  to  be  simultaneously 
satisfied.  Although  cruder  and  only  an  approximation,  a  simpler 
approach  to  situations  in  which  two  or  more  outliers  are  present 
would  be  to  use  eqn.  (4.10)  as  a  guide  to  an  initial  specification 
of  weights  and  then  alter  the  weights  jointly  until  a  satisfactory 
combination  of  leverage  values  is  attained. 

5. OUTLIER-INDUCED MULTICOLLINEARITIES

Observations which possess very large values on two or more
predictor variables can induce multicollinearities among the
predictor variables. Unlike the usual situation in which all
observations conform to the multicollinearity, an outlier-induced
multicollinearity is an artifact of the outliers and not a true
indication of a redundancy among the predictor variables.
Deletion of the outliers from the data base destroys this type of
multicollinearity.
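
A small numerical illustration of this effect, using made-up data and a hypothetical helper function: two predictors that are nearly uncorrelated become almost perfectly correlated once a single row with very large values on both is added, and deleting that row removes the multicollinearity again.

```python
import math


def correlation(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]

print(correlation(x1, x2))                          # close to zero
print(correlation(x1 + [1000.0], x2 + [1000.0]))    # close to one
```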

The effects of an outlier-induced multicollinearity on a
regression analysis are similar to those resulting from a true
multicollinearity. Coefficient estimates tend to be too large
in magnitude, their signs tend to be determined by the
multicollinearity itself and not the true relationship between response
and predictor variables, and the variances of coefficient
estimators for multicollinear predictor variables can be orders of
magnitude larger than if the predictor variables were not
multicollinear. For example, if eqn. (2.1) represents a two-variable,
no-intercept model the estimating equations (2.3) become

$$ \sum_{i=1}^{n} X_{ij}\,\psi(Y_i - \hat\beta_1 X_{i1} - \hat\beta_2 X_{i2}) = 0 , \quad j = 1, 2 . \tag{5.1} $$

If one now lets $X_{1j} \to \infty$, $j = 1, 2$, and restricts $X_{11}$ and $X_{12}$ so
$X_{11}/X_{12} = 1$, eqns. (5.1) reduce to

$$ \psi(Y_1 - (\hat\beta_1 + \hat\beta_2)X_{11}) = 0 , \tag{5.2} $$

which has as solutions $\hat\beta_1 + \hat\beta_2 = Y_1/X_{11} \to 0$. The solution
of eqn. (5.2) forces $\hat\beta_1 = -\hat\beta_2$ but $\hat\beta_1$ and $\hat\beta_2$ can have almost any
magnitude, regardless of the true values of $\beta_1$ and $\beta_2$. This
type of ambiguous solution is characteristic of least squares
estimation when predictor variables are multicollinear. Since
eqns. (5.1) and (5.2) are also estimating equations for
M-estimators, robust M-estimation can also suffer ill-effects of
outlier-induced multicollinearities.

Biased  estimation  is  frequently  offered  as  a  solution  to 
estimation  with  multicollinear  predictor  variables.  An  attractive 
alternative  to  biased  estimation  when  multicollinearities  are 
caused  by  a  few  outliers  is  robust  estimation;  however,  it  should 
be  apparent  from  the  above  example  that  the  robust  procedures 
must  be  resistant  to  predictor  variable  outliers. 




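As with the single-variable example in the previous section,
weighting a single outlier provides protection against the
domination of the estimator by the outlier. For this two-variable
example the weighted M-estimator is obtained from the equations

$$ \sum_{i=1}^{n} \psi_\eta(X_{ij})\,\psi(Y_i - \hat\beta_1 X_{i1} - \hat\beta_2 X_{i2}) = 0 , \quad j = 1, 2 , $$

where $\psi_\eta(X_{ij}) = X_{ij}$ for $i \ne 1$ and $\psi_\eta(X_{1j}) = \omega_1 X_{1j}$.
If $X_{1j} \to \infty$ with $X_{11}/X_{12} = 1$,

$$ \psi_\eta(X_{1j}) = \omega_1 X_{1j} = \left\{ \frac{\eta}{1-\eta} \cdot \frac{(\Sigma^* X_{i1}^2)(\Sigma^* X_{i2}^2) - (\Sigma^* X_{i1}X_{i2})^2}{\Sigma^* X_{i1}^2 + \Sigma^* X_{i2}^2 - 2\,\Sigma^* X_{i1}X_{i2}} \right\}^{1/2} , \quad j = 1, 2 , $$

where $\Sigma^*$ indicates that the summation is for all $i \ne 1$. Note in
particular that as $X_{1j} \to \infty$ with $X_{11}/X_{12} = 1$, $\psi_\eta(X_{1j})$ is bounded.
Thus M-estimation with this resistant weighting cannot be
dominated by the outlier-induced multicollinearity.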

Another facet of outlier-induced multicollinearities is that
multicollinearities can actually be strengthened when any of the
resistant procedures suggested in the previous section are used.
In fact, occasionally one induces a multicollinearity where none
previously existed by weighting the predictor variable values.
Extreme care must be exercised when these procedures are used;
in particular, one should always examine the latent roots and
latent vectors of the correlation matrix of predictor variables,
variance inflation factors, etc., to determine whether
multicollinearities have been induced or strengthened by the weighting of
predictor variable values. The example discussed in the next
section illustrates this point.

6.  GASOLINE  MILEAGE  DATA 

Hocking (1976), as part of an important and extensive survey
of variable selection techniques, utilized a set of data on
gasoline consumption to illustrate the procedures he discussed.
The original data set consists of a response variable, gasoline
mileage (MPG), and ten predictor variables for each of 32
automobiles, the data taken from several issues of "Motor Trend"
magazine. The ten predictor variables are engine shape (SHAPE),
number of engine cylinders (CYL), automatic or manual transmission
(AM), number of transmission speeds (GEAR), engine size (SIZE),
engine horsepower (HP), number of carburetor barrels (CARB),
final drive ratio (DRAT), weight (WT), and quarter mile time
(TIME). Henderson and Velleman (1981) criticize the use of MPG
as the response variable, preferring to use GPM = (MPG)$^{-1}$, and
suggest that the preponderance of sports cars in the data base
would make RATIO = HP/WT a potentially valuable addition to the
set of predictor variables. After eliminating several predictor
variables which do not appear to aid in the prediction of the
response variable, we decided to illustrate the procedures
discussed in the previous sections by regressing GPM on CYL, HP,
DRAT, WT, AM, GEAR, and RATIO.

After examining various plots of the data to ensure that no
further transformations were needed, several statistics were
calculated for each observation as an aid to the detection
of possible outliers. Table 1 displays these statistics along
with a list of the automobiles included in the data set. The
leverage values, eqn. (4.4), for the complete data set are
shown in the second column of the table. The Lotus Europa,
$h_{ii} = 0.872$, and the Maserati Bora, $h_{ii} = 0.681$, have the largest
leverage values in the data set, both greatly exceeding Hoaglin
and Welsch's (1978) rough cutoff of 2(p+1)/n = 0.5.
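
A minimal sketch of the leverage screening follows. It assumes that the leverage values are the diagonal elements of the hat matrix for a model with an intercept; the data are synthetic (the paper's data are not reproduced) and `leverage_values` is a hypothetical helper name.

```python
import numpy as np

def leverage_values(X):
    """Diagonal of the hat matrix Z (Z'Z)^{-1} Z', where Z = [1, X]."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])
    Q, _ = np.linalg.qr(Z)                    # reduced QR; H = Q Q'
    return np.sum(Q**2, axis=1)

rng = np.random.default_rng(2)
n, p = 32, 7                                  # same dimensions as the example
X = rng.normal(size=(n, p))
X[0] *= 8.0                                   # one row far from the others

h = leverage_values(X)
cutoff = 2 * (p + 1) / n                      # rough cutoff 2(p+1)/n
flagged = np.flatnonzero(h > cutoff)

print("sum of leverage values:", h.sum())     # equals p + 1
print("cutoff:", cutoff)
print("rows exceeding the cutoff:", flagged)
```

Because the leverage values sum to p + 1, their average is (p+1)/n, and the cutoff is simply twice that average.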

[Insert Table 1]

Also displayed in Table 1 are studentized deleted residuals
$t_{(-i)}$ (Gunst and Mason 1980, Section 7.1.3). These statistics
measure the difference between an observed response $Y_i$ and
its predicted value $\hat{Y}_{(-i)}$ which is obtained using least squares
coefficient estimates derived from the other (n-1) observations.
Let SSE denote the residual sum of squares from the fit to GPM
using all 32 observations. Then the ith studentized deleted
residual is calculable as

$$t_{(-i)} = \frac{Y_i - \hat{Y}_{(-i)}}{\{\widehat{\mathrm{Var}}[Y_i - \hat{Y}_{(-i)}]\}^{.5}} = \frac{r_i}{(1 - h_{ii})^{.5}\,\hat{\sigma}_{(-i)}},$$

where $(n-p-2)\,\hat{\sigma}^2_{(-i)} = \mathrm{SSE} - (1-h_{ii})^{-1} r_i^2$. Individually the $t_{(-i)}$
follow a Student t distribution with (n-p-2) degrees of freedom.
Although the $t_{(-i)}$ are correlated, t tables can be used to
furnish useful cutoff values for the detection of outliers.
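
The shortcut formula can be checked against an explicit refit without the ith observation. The sketch below uses synthetic data and hypothetical function names; it assumes a model with an intercept and p predictors, so that the deleted fit has n-p-2 degrees of freedom.

```python
import numpy as np

def studentized_deleted_residuals(X, y):
    """t_(-i) from the full fit only, via SSE, r_i, and h_ii."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    Q, _ = np.linalg.qr(Z)
    h = np.sum(Q**2, axis=1)
    sse = r @ r
    s2_deleted = (sse - r**2 / (1.0 - h)) / (n - p - 2)
    return r / np.sqrt(s2_deleted * (1.0 - h))

def deleted_residual_by_refit(X, y, i):
    """Same statistic, computed by actually refitting without row i."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    keep = np.arange(n) != i
    Zk, yk = Z[keep], y[keep]
    beta, *_ = np.linalg.lstsq(Zk, yk, rcond=None)
    resid = yk - Zk @ beta
    s2 = resid @ resid / (Zk.shape[0] - Zk.shape[1])
    pred_error = y[i] - Z[i] @ beta
    var = s2 * (1.0 + Z[i] @ np.linalg.solve(Zk.T @ Zk, Z[i]))
    return pred_error / np.sqrt(var)

rng = np.random.default_rng(3)
n, p = 32, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.3, size=n)
y[4] += 2.0                                   # plant one response outlier

t = studentized_deleted_residuals(X, y)
print("shortcut :", t[4])
print("by refit :", deleted_residual_by_refit(X, y, 4))
```

The agreement of the two computations illustrates why no refitting is needed: the full-data residuals and leverage values carry all the required information.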

The Chrysler Imperial has the largest studentized deleted
residual in Table 1. Its value, $t_{(-i)} = -4.000$, places this
statistic in the extreme lower tail of the corresponding t
distribution and warrants a close examination of the Chrysler
as a possible outlier. The Cadillac Fleetwood and Pontiac
Firebird have moderately large studentized deleted residuals
but they are not so unusually large as to be of concern in a sample
of 32 observations.

Another statistic which will be examined as an aid in the
detection of outliers is Cook's (1977) distance measure. Let
$\hat{\beta}_{(-i)}$ denote the least squares estimator of $\beta$ which is calculated
from the (n-1) observations excluding the ith one. Cook (1977)
defines a statistic

$$D_i = \frac{(\hat{\beta}_{(-i)} - \hat{\beta})'\,X'X\,(\hat{\beta}_{(-i)} - \hat{\beta})}{(p+1)\,\mathrm{MSE}}$$

which allows a direct comparison of the least squares estimator
from the complete data set, $\hat{\beta}$, with the estimator calculated
from (n-1) data points, $\hat{\beta}_{(-i)}$. Although this statistic does not
follow an F distribution, Cook suggested that F tables could still
provide useful cutoff values for the detection of outliers.
Because elimination of one observation from a homogeneous data
set should leave $\hat{\beta}_{(-i)}$ relatively unchanged from $\hat{\beta}$, Cook further
suggested that a value of $D_i$ which is larger than a lower 10%
F value should be carefully studied as a possible outlier. We
feel this criterion is often too conservative and choose to use
a lower 25% cutoff value, $F_{.25}(8,24) = 0.623$. With this cutoff
value the Lotus Europa is judged to have a strong influence on
the estimation of $\beta$. If the lower 10% F value, $F_{.10}(8,24) = 0.416$,
is used the Chrysler Imperial would also be extremely influential
on the estimation of $\beta$.
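
Cook's distance can also be obtained from the full fit alone, since $D_i = r_i^2 h_{ii} / [(p+1)\,\mathrm{MSE}\,(1-h_{ii})^2]$. The sketch below, on synthetic data with hypothetical function names, compares that shortcut with the defining quadratic form and applies the two cutoff values quoted in the text (which are appropriate here only because the synthetic example also uses n = 32 and p = 7).

```python
import numpy as np

def cooks_distance(X, y):
    """D_i from the full fit: r_i^2 h_ii / ((p+1) MSE (1 - h_ii)^2)."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    Q, _ = np.linalg.qr(Z)
    h = np.sum(Q**2, axis=1)
    mse = r @ r / (n - p - 1)
    return r**2 * h / ((p + 1) * mse * (1.0 - h) ** 2)

def cooks_distance_by_deletion(X, y, i):
    """D_i from its definition, refitting without observation i."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    mse = r @ r / (n - p - 1)
    keep = np.arange(n) != i
    beta_del, *_ = np.linalg.lstsq(Z[keep], y[keep], rcond=None)
    d = beta_del - beta
    return d @ (Z.T @ Z) @ d / ((p + 1) * mse)

rng = np.random.default_rng(4)
n, p = 32, 7
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)
X[0] *= 6.0                                   # unusual predictor values
y[0] += 3.0                                   # and an unusual response

D = cooks_distance(X, y)
F25, F10 = 0.623, 0.416                       # lower 25% and 10% points of F(8,24), from the text
print("exceed lower 25% cutoff:", np.flatnonzero(D > F25))
print("exceed lower 10% cutoff:", np.flatnonzero(D > F10))
```

Any observation flagged by the 25% cutoff is necessarily flagged by the 10% cutoff as well, so the 10% rule screens more observations.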

The  Cadillac,  the  Lincoln,  and  the  Chrysler  are  the  only 
American-made  luxury  cars  which  are  included  in  this  data  set. 

They  have  very  similar  values  on  the  predictor  variables;  e.g., 
they  all  have  eight  cylinder  engines,  they  are  the  heaviest 
automobiles  in  the  data  set,  etc.  In  fact,  collectively  they 
could  be  considered  outliers  because  of  their  size  relative  to 
the  other  automobiles  in  Table  1.  Yet  individually  their  unusual 
features  tend  to  be  masked  because  they  are  similar  among  them¬ 
selves;  therefore,  they  do  not  induce  individually  large  leverage 


values in Table 1. The Chrysler Imperial possesses a large
studentized deleted residual and a moderate-sized $D_i$ because its
gasoline mileage (hence, GPM) differs from that of the Cadillac and the
Lincoln. The Chrysler's gasoline mileage is 14.7 while that of
the Cadillac and Lincoln is 10.4 (see Henderson and Velleman 1981,
Table 1). Since the Chrysler is an outlier due to an unusual
response  value,  M-estimation  should  compensate  for  its  influence 
on  the  fit. 

The  Lotus  Europa  poses  different  problems.  Its  leverage 
value  suggests  that  the  predictor  variables  for  the  Lotus  are 
unusual  and  M-estimation  alone  might  be  unable  to  satisfactorily 
adjust  for  the  ill  effects  of  the  Lotus.  The  Lotus  is  an  outlier 
in  predictor  variable  space  because  of  the  inclusion  of  RATIO 
as  a  predictor  variable.  The  Lotus  has  relatively  small  values 
on  HP  and  WT  but,  unlike  other  automobiles  in  Table  1  which  also 
have  small  values  on  HP  and  WT,  it  possesses  an  unusually  large 
value  of  RATIO.  Other  automobiles  in  Table  1  also  possess  large 
values  on  RATIO  but  they  have  large  values  on  HP  and  WT  as  well. 
The  Lotus  is  a  three-dimensional  outlier  which  the  leverage  values 
have  aided  in  detecting. 

Although  the  Maserati  Bora  also  has  a  large  leverage  value, 
primarily  due  to  its  unusually  large  horsepower,  it  does  not 
have correspondingly large values of $t_{(-i)}$ or $D_i$. This suggests
that  the  Maserati  is  not  unduly  influencing  the  fit. 


The  last  three  columns  of  Table  1  display  the  leverage  values, 
studentized  deleted  residuals,  and  Cook's  distance  values  for  a 
reduced  data  set  in  which  the  Chrysler  and  the  Lotus  are  eliminated. 
Although  the  leverage  value  for  the  Maserati  is  now  considerably 
larger than for the complete data set, the $t_{(-i)}$ and $D_i$ values
still do not indicate that the Maserati is unduly distorting the
fit. Scanning the other $t_{(-i)}$ and $D_i$ values in the last two
columns  does  not  lead  one  to  conclude  that  any  other  observations 
in  this  data  set  are  strongly  influencing  the  fit. 

In order to gauge the impact of the Chrysler and the Lotus
on the coefficient estimates, least squares estimates for the
complete data set are compared with those for the reduced (n = 30)
data set in the upper portion of Table 2. There are important
differences in the significance (HP, AM, RATIO) and magnitudes
(HP, DRAT, WT, AM, GEAR, RATIO) of the two sets of estimates.
M-estimates, computed as described in Section 2 using initial
least squares estimates of $\beta$ and $\sigma$, are displayed in the lower
portion  of  Table  2.  The  M-estimates  for  the  complete  data  set 
and  the  reduced  base  set  of  observations  are  quite  similar  to 
the  corresponding  least  squares  estimates.  This  is  reassuring 
for  the  base  set  but  suggests  that  M-estimation  for  the  complete 
data  set  has  not  successfully  compensated  for  the  inclusion  of 
the  two  outliers.  One  would  expect  to  see  the  M-estimates  for  the 
complete  data  set  closer  to  the  M-estimates  for  the  base  set 


than  to  the  least  squares  estimates  for  the  complete  data  set  if 
M-estimation  is  adequately  adjusting  for  these  outliers. 
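
For readers who want to experiment with this kind of comparison, a minimal sketch of M-estimation by iteratively reweighted least squares follows. It is not the algorithm of Section 2: it assumes Huber's psi function with tuning constant c = 1.345, a scale estimate based on the median absolute deviation of the residuals that is recomputed at every iteration, and least squares starting values. The data are synthetic and `huber_irls` is a hypothetical name.

```python
import numpy as np

def huber_irls(X, y, c=1.345, n_iter=50, tol=1e-10):
    """Huber M-estimates via iteratively reweighted least squares.
    Returns the coefficients (intercept first) and the final weights."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)      # least squares start
    w = np.ones(n)
    for _ in range(n_iter):
        r = y - Z @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        u = np.abs(r) / scale
        w = c / np.maximum(u, c)                      # 1 if |u| <= c, else c/|u|
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, w

rng = np.random.default_rng(5)
n = 30
x = rng.normal(size=n)
x[5] = 0.0                                            # ordinary predictor value
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=n)
y[5] += 5.0                                           # unusual response value

Z = np.column_stack([np.ones(n), x])
beta_ls, *_ = np.linalg.lstsq(Z, y, rcond=None)
beta_m, w = huber_irls(x[:, None], y)

print("least squares:", beta_ls)
print("M-estimates:  ", beta_m)
print("weight of the outlier:", w[5])
```

In this sketch the outlier has an ordinary predictor value, so its residual is large and it is heavily downweighted; the situation described in the text, where M-estimation fails to compensate, involves observations whose predictor values are also unusual.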

[Insert Table 2]

The remaining columns of Table 2 exhibit least squares estimates
and M-estimates for each of the predictor variable transformations
which were discussed in Section 4. Overall, these "resistant"
estimation schemes seem to perform worse than just using
M-estimation on the (raw) complete data set predictor variables,
with the possible exception of the weighted predictor variables
in the last column. These estimates (obtained by setting
$\eta = \bar{h} = .25$ for the Lotus) are quite similar in magnitude to the
M-estimates for the complete data set but several of the coefficients
are not significant when it appears they should be. Regardless of
these comparisons, none of the resistant predictor variable
transformations shows substantial improvement over M-estimation using the
raw predictor variables.

The  poor  performance  of  the  resistant  estimators  is 
attributable  in  part  to  a  strong  multicollinearity  among  the 
predictor  variables.  Inclusion  of  RATIO  with  HP  and  WT,  while 
seemingly  an  important  addition  to  the  set  of  predictor  variables, 
has  induced  a  three-variable  multicollinearity  of  the  form 

.53 RATIO - .70 HP + .45 WT = 0.

This multicollinearity is detectable from the latent roots and
latent vectors of $X'X$ and is further evidenced by the variance


inflation  factors  of  HP,  WT,  and  RATIO:  51.6,  23.3,  and  30.1, 
respectively.  Note  that  the  signs  on  the  variables  in  the 
above  multicollinearity  are  identical  with  the  signs  of  the 
least  squares  estimates  for  the  complete  data  set  in  Table  2. 
Elimination  of  the  Chrysler  and  the  Lotus  worsens  the  problem 
since  it  strengthens  the  multicollinearity.  The  smallest 
latent root of $X'X$ drops from 0.0098 to 0.0024 for the base
set  and  the  three  variance  inflation  factors  increase  to 
217.0,  60.5  and  143.7,  respectively.  Note  too  that  the  magni¬ 
tudes  of  the  coefficient  estimates  for  HP,  WT,  and  RATIO  all 
increase  when  the  two  outliers  are  removed  from  the  complete 
data  set  and  the  signs  of  the  coefficient  estimates  again  cor¬ 
respond  to  those  of  the  above  multicollinearity.  These  sign 
patterns  and  large  magnitudes  are  well-known  characteristics 
of  the  ill  effects  of  multicollinearities. 

Each  of  the  resistant  estimators,  despite  their  clear 
computational  differences,  either  maintains  or  strengthens  the 
multicollinearity  among  HP,  WT,  and  RATIO.  Due  to  the  tendency 
for  both  outliers  and  multicollinearities  to  distort  coefficient 
estimates,  it  would  be  fortuitous  if  any  of  the  estimates  in 
Table  2  were  to  accurately  reflect  the  true  relationship  between 
GPM and these predictor variables. Since the multicollinearity
is not outlier-induced, one cannot expect the resistant estimators
to overcome the ill effects of the multicollinearity; indeed,
several of the estimators in Table 2 exhibit the same sign
pattern  as  the  least  squares  estimates,  although  the  magnitudes 
for  the  multicollinear  predictor  variables  tend  to  be  smaller 
than  those  given  by  the  estimates  for  the  base  set. 

Two changes were made in the data set in order to further
examine these estimators. First, since the multicollinearity
is due to a defined relationship between three predictor variables,
viz., RATIO = HP/WT, one of these variables can be eliminated from
the data set without seriously impairing prediction of GPM.
RATIO was added to the data set because of the nature of the
automobiles which are included in Table 1 and the belief that
it might represent an important characteristic of the foreign
sports cars. WT shows up as an important predictor variable in
every analysis performed on these data. Consequently, we decided
to eliminate HP and break up the induced multicollinearity. An
alternative to this approach would be to retain all three
predictor variables and combine robust and biased estimation
procedures (e.g., Askew and Montgomery 1980) but this alternative
is beyond the scope of the present paper. The second change
made in the data set was to increase the RATIO variable on the
Lotus Europa from 75 to 200 in order to accentuate the distortion
it causes as an outlier.

With these two changes in the data set, HP removed and
the Lotus' RATIO value set to 200, the outlier statistics for the
complete and base (n = 30) data sets are as shown in Table 3. The
only large leverage value in the complete data set occurs for
the Lotus Europa. The largest studentized deleted residual is
associated with the Chrysler Imperial, and the Lotus and the
Chrysler both have $D_i$ values which exceed a lower 25% cutoff
value for an F(7,25) distribution. In contrast, the outlier
statistics for the base set reveal no strong indications of
outliers.

[Insert  Table  3] 

Table 4 displays the least squares and the M-estimates for
the same estimators as in Table 2. Again the least squares
estimates and the M-estimates for the base set are quite similar.
Unlike Table 2, the least squares estimates and the M-estimates
are not virtually identical for the complete data set. The
M-estimates for the complete data set are closer to the M-estimates
for the base set than are the least squares estimates. With HP
removed, the M-estimates do appear to be reducing the effect of
the large residuals. The two observations which have $\psi(r_i/\sigma)$
values less than 1.0 in the last iteration of eqn. (2.12)
correspond to the Chrysler Imperial and the Pontiac Firebird, the
two observations which have the largest $t_{(-i)}$ values in the
third column of Table 3. Interestingly, the Lotus Europa is not
weighted by M-estimation on the complete data set. The effect
of the Lotus on the coefficient estimates is unaltered by
M-estimation.

[Insert  Table  4] 

The remaining estimators in Table 4 are the same as those in
Table 2 except for the weighted estimates. In
studying the M-estimates for all the estimators in Tables 2 and
4, it appeared that the M-estimates either would fail to weight
or would not adequately weight residuals whose leverage values
were not sufficiently small, even if the residual appeared to be
large. In other words, if a residual was large it would only
get weighted by M-estimation if its leverage value was suitably
small. Adequate weighting by M-estimation seemed to require
that the leverage value be no larger than the average of all
the leverage values, $\bar{h} = (p+1)/n$. In applying the weighted
estimator (4.7) we decided to weight the rows of X corresponding
to both the Chrysler Imperial and the Lotus Europa so their
leverage values would equal 0.20 ($\bar{h} = 0.22$).
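
One way to see why a large leverage value can keep a residual from being weighted is that a shift of size delta in the ith response changes the ith least squares residual by only (1 - h_ii) delta. The sketch below illustrates this with made-up numbers for a straight-line fit; the helper name is hypothetical and the example is not drawn from the gasoline mileage data.

```python
import numpy as np

def ls_residuals_and_leverage(x, y):
    """Least squares residuals and leverage values for a line with intercept."""
    Z = np.column_stack([np.ones(len(x)), x])
    H = Z @ np.linalg.solve(Z.T @ Z, Z.T)     # hat matrix
    return y - H @ y, np.diag(H)

x = np.append(np.linspace(-1.0, 1.0, 19), 10.0)   # last point has an extreme x
y = 1.0 + 2.0 * x                                  # exactly linear responses
delta = 5.0                                        # size of the response error

results = {}
for i in (9, 19):                                  # a central point, then the extreme one
    y_shift = y.copy()
    y_shift[i] += delta
    r, h = ls_residuals_and_leverage(x, y_shift)
    results[i] = (h[i], r[i])
    print(f"obs {i:2d}: leverage = {h[i]:.3f}, residual = {r[i]:.3f}")
```

The same response error leaves a residual near delta at the central point but only a small residual at the high-leverage point, so a procedure that weights observations according to residual size has little to act on there.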

The  first  three  "resistant"  estimators  shown  in  Table  4 
still  fail  to  improve  on  the  M-estimates  for  the  raw  predictor 
variables  in  the  complete  data  set.  Each  of  these  estimates 
attempts  to  adjust  for  outliers  by  modifying  the  observations 
on  a  single  predictor  variable  or  a  single  principal  component 
without  regard  to  the  values  on  the  other  predictor  variables 
or principal components. In each case some of the coefficient
estimates appear to be close to those of the base set but others
are not. Since the predictor variables are not orthogonal,
robust  regression  procedures  which  weight  variables  or  com¬ 
ponents  individually  might  not  only  be  incapable  of  adequately 
adjusting  for  outliers  but  they  could  also  further  distort 
the  estimates  by  changing  the  correlation  structure  among  the 
predictor  variables.  These  first  three  estimators  could  be 
suffering  such  a  problem. 

The  last  estimator  does  seem  to  adequately  adjust  for  the 
outliers  in  this  data  base.  After  the  predictor  variable  values 
for  the  Chrysler  and  the  Lotus  are  weighted  so  their  leverage 
values  are  approximately  0.20,  the  M-estimates  weight  the 
residuals  corresponding  to  both  of  these  observations  and 
the Pontiac Firebird. The resulting coefficient estimates
are  the  most  similar  to  the  base  set  in  Table  4. 

7.  CONCLUDING  REMARKS 

The  need  for  developing  regression  procedures  which  are 
both  robust  to  error  assumption  violations  and  resistant  to 
aberrant  predictor  variable  values  has  been  demonstrated  in 
the  theoretical  derivations  of  Sections  2  and  3  and  the  examples 
discussed  in  the  previous  section.  Further  studies  are  needed 
before any of the procedures discussed in this paper can be
recommended for general use but the example suggests several
important properties which good resistant estimators should possess.
First,  they  must  be  able  to  adjust  for  outliers  in  the  predictor 
variables  without  substantially  altering  the  correlation 


structure  which  is  imposed  by  the  "non-outlier"  observations. 


Whether  weighting  schemes  which  operate  on  columns  of  X  rather 
than  its  rows  can  accomplish  such  an  adjustment  without  altering 
the  true  underlying  correlation  structure  remains  to  be  carefully 
investigated.  A  second  property  of  robust/resistant  estimators 
which  seems  desirable  is  that  the  estimators  should  be  capable  of 
weighting  large  residuals  even  if  in  the  raw  data  set  the  residuals 
are accompanied by large leverage values. If M-estimation is used
on the residuals, this might require a weighting of observations
to ensure that leverage values are sufficiently small. Small
leverage  values  are  not  only  desirable  for  accurate  estimation 
but  also,  as  Huber  (1981,  Chapter  7)  proves,  required  for  asymptotic 
normality  of  the  estimators  for  nonnormal  errors. 

Rank  transforms  offer  another  possible  alternative  to  the 
procedures  studied  in  this  article.  Rank  transforms  have  been 
shown  by  Iman  and  Conover  (1979)  to  be  effective  robust  alternatives 
for  prediction  of  the  response  variable  but  not  necessarily  for 
parameter  estimation,  the  focus  of  this  paper. 
TABLE 1. OUTLIER STATISTICS FOR GASOLINE MILEAGE DATA.
(Prediction of GPM)

                        Complete Data Set         Base Set (n = 30)
Automobile Type         h_ii   t_(-i)     D_i     h_ii   t_(-i)     D_i

Mazda RX-4              .166     .162    .001     .177     .066    .000
Mazda RX-4 Wagon        .184    -.258    .002     .187    -.523    .008
Datsun 710              .168     .534    .007     .202     .379    .005
Hornet 4 Drive          .119    -.677    .008     .125    -.845    .013
Hornet Sportabout       .126    -.994    .018     .134   -1.231    .029
Valiant                 .224     .377    .005     .228     .508    .010
Duster 360              .255     .614    .017     .258     .611    .017
Mercedes 240D           .316    -.329    .007     .346    -.219    .003
Mercedes 230            .276    -.341    .006     .278    -.651    .021
Mercedes 280            .237    -.107    .000     .238    -.239    .002
Mercedes 280C           .237     .559    .012     .238     .610    .015
Mercedes 450SE          .080    -.935    .009     .092   -1.453    .026
Mercedes 450SL          .092    -.805    .008     .094   -1.027    .014
Mercedes 450SLC         .088     .304    .001     .091     .373    .002
Cadillac Fleetwood      .265    2.342    .208     .358    1.485    .146
Lincoln Continental     .318    1.802    .173     .443     .625    .040
Chrysler Imperial       .298   -4.000    .522
Fiat 128                .164    -.809    .016     .243    -.705    .020
Honda Civic             .381     .413    .014     .423     .927    .079
Toyota Corolla          .140    -.500    .005     .145    -.380    .003
Toyota Corona           .382     .729    .042     .519     .477    .032
Dodge Challenger        .215    1.109    .042     .222    1.933    .119
AMC Javelin             .162    1.230    .036     .165    1.798    .073
Camaro Z-28             .328     .839    .044     .360     .588    .025
Pontiac Firebird        .088   -1.938    .041     .092   -2.747    .073
Fiat X1-9               .149     .505    .006     .167    1.033    .027
Porsche 914-2           .170     .474    .006     .193     .623    .012
Lotus Europa            .872   -1.394   1.590
Ford Pantera L          .364     .120    .001     .480    -.532    .034
Ferrari Dino 1973       .236     .288    .003     .425     .128    .002
Maserati Bora           .681    -.227    .014     .806     .307    .051
Volvo 142E              .217    -.189    .001     .271   -1.195    .065

TABLE 2. COMPARISON OF GPM COEFFICIENT ESTIMATES (x 10^-3).

Least Squares Estimates

Predictor   Base Set    Complete     Huber's     Mallows   Principal    Weighted
Variable    (n = 30)    Data Set                           Component

CYL           -.958        .459       -.673       4.181*       .571        .513
HP            -.309*      -.071        .242*      -.193       -.202       -.046
DRAT          7.042*      3.389*     -3.222       2.799       2.083       3.285
WT           31.664*     17.747*      6.768      14.992      24.000*     16.766*
AM            7.553*      4.468       4.307       3.009       5.767       4.450
GEAR         -8.966*     -5.118*     -2.280      -7.505      -2.225      -4.936
RATIO         1.378*       .491       -.672*       .127       1.037        .402

M-Estimates

Predictor   Base Set    Complete     Huber's     Mallows   Principal    Weighted
Variable    (n = 30)    Data Set                           Component

CYL           -.782        .078       -.213       4.181*       .148       -.013
HP            -.303*      -.096        .226*      -.193       -.234       -.146
DRAT          6.960*      4.794      -2.235       2.799       3.838       4.804
WT           31.278*     20.855*      5.607      14.992      27.280*     22.543*
AM            7.523*      5.941*      2.710       3.009       7.165       5.747
GEAR         -8.937*     -6.371*     -2.147      -7.505      -3.736      -6.564*
RATIO         1.348*       .588*      -.582        .127       1.178*       .769

*Significant at an α = .20 (two-tailed) level.


TABLE 3. OUTLIER STATISTICS FOR ALTERED GASOLINE MILEAGE DATA.
(Prediction of GPM)

                        Complete Data Set         Base Set (n = 30)
Automobile Type         h_ii   t_(-i)     D_i     h_ii   t_(-i)     D_i

Mazda RX-4              .120    -.055    .000     .154     .289    .002
Mazda RX-4 Wagon        .123    -.540    .006     .185    -.447    .007
Datsun 710              .156     .725    .014     .186     .559    .011
Hornet 4 Drive          .120    -.629    .008     .124    -.772    .012
Hornet Sportabout       .125    -.967    .019     .127   -1.067    .024
Valiant                 .226     .318    .004     .228     .496    .011
Duster 360              .127    1.018    .021     .258     .564    .016
Mercedes 240D           .288    -.568    .019     .296    -.578    .021
Mercedes 230            .262    -.229    .003     .276    -.723    .029
Mercedes 280            .209    -.327    .004     .238    -.205    .002
Mercedes 280C           .209     .305    .004     .238     .626    .018
Mercedes 450SE          .073   -1.025    .012     .089   -1.507    .030
Mercedes 450SL          .091    -.841    .010     .092   -1.063    .016
Mercedes 450SLC         .086     .224    .001     .089     .294    .001
Cadillac Fleetwood      .245    2.150    .187     .353    1.581    .183
Lincoln Continental     .303    1.707    .168     .440     .706    .057
Chrysler Imperial       .313   -3.776    .606
Fiat 128                .134    -.977    .021     .138   -1.149    .030
Honda Civic             .375     .177    .003     .399     .602    .035
Toyota Corolla          .136    -.530    .007     .139    -.482    .006
Toyota Corona           .226    1.337    .072     .432     .983    .105
Dodge Challenger        .194     .795    .022     .220    1.782    .117
AMC Javelin             .132     .957    .020     .160    1.873    .086
Camaro Z-28             .276    1.178    .075     .348     .375    .011
Pontiac Firebird        .082   -1.996    .045     .090   -2.724    .082
Fiat X1-9               .134     .393    .004     .136     .717    .012
Porsche 914-2           .177     .384    .005     .180     .777    .019
Lotus Europa            .921   -1.520   3.666
Ford Pantera L          .350     .327    .009     .359     .127    .001
Ferrari Dino 1973       .278     .413    .010     .301     .696    .030
Maserati Bora           .315     .168    .002     .484    -.912    .112
Volvo 142E              .194     .090    .000     .241    -.854    .034

TABLE 4. COMPARISON OF GPM COEFFICIENT ESTIMATES (x 10^3), ALTERED DATA BASE.

Least Squares Estimates

Predictor   Base Set    Complete     Huber's     Mallows   Principal    Weighted
Variable    (n = 30)    Data Set                           Component

CYL           -.757       2.251*       .072       4.478*      2.307       1.653
DRAT          4.865*      2.917        .917       1.498       1.339       4.449
WT           19.312*     13.895*     22.114*     13.048*     16.052*     16.407*
AM            7.053*      4.825       8.847*      6.504       5.447       6.422
GEAR         -6.580*     -2.478      -3.043        .003       1.578      -3.804
RATIO          .303*       .067*       .277       -.150*       .103        .093

M-Estimates

Predictor   Base Set    Complete     Huber's     Mallows   Principal    Weighted
Variable    (n = 30)    Data Set                           Component

CYL           -.584       1.865*       .888       4.342*      2.155*       .807
DRAT          4.825*      4.199       1.548       2.067       3.006       4.106
WT           19.164*     15.944*     18.151*     14.083*     17.918*     16.853*
AM            7.032*      6.180*      5.696       7.318       6.216       6.169*
GEAR         -6.598*     -3.645      -2.516       -.366        .881      -4.680*
RATIO          .295*       .083*       .299       -.147*       .109*       .179*

*Significant at an α = .20 (two-tailed) level.